A Polish mathematician explains the behind-the-scenes of the AI exam. "It took 13 pages to solve just one problem."

As part of the FrontierMath project, researchers developed a mathematics exam for comparing AI models: a set of problems that no single mathematician could solve in its entirety. Project participant Dr. Bartosz Naskręcki distilled 15 years of research into a single problem. Currently, AI solves only a few of these questions.

photo: SIRITAT TECHAPHALOKUL // Shutterstock

Over the past six months, leading labs like Google DeepMind and OpenAI have released AI models that easily handle math problems at the level of a high school leaving exam. The existing benchmarks used to assess the models' mathematical abilities became useless, and a new, much more challenging exam for large language models (LLMs) was needed. Thus the FrontierMath project was born, co-created by Dr. Naskręcki of Adam Mickiewicz University.

The project, coordinated by Epoch AI, has several difficulty levels. Dr. Naskręcki co-created the most challenging one, Tier 4. Current AI models can solve only four of the 50 problems in the set, which span five areas of mathematics.

"I was invited to prepare a task. The answer was to be a very large number, so that the model couldn't accidentally guess it. I put all my expert knowledge, accumulated during all my years of study and work, into this task," the mathematician told PAP.

He explained that he was supposed to propose a completely new problem, the solution of which couldn't be found online. "It's essentially my buried scientific work. The documented solution took up 13 pages of dense, mathematical text," explained Dr. Naskręcki.

Each of the 50 problems is similarly difficult. According to Dr. Naskręcki, an expert with a doctorate in the relevant area of mathematics would need at least a month just to figure out how to approach the solution.

"I don't think there's a mathematician in the world who could solve all 50 problems in this set," he added.

How did this "genius exam" come about? Thirty experts from around the world met for two days in Berkeley. In small groups, divided by topic (number theory, topology, combinatorics, mathematical analysis, algebraic geometry), they tested parts of the problems on the most powerful AI models (in incognito mode, so the models couldn't memorize them). They also reworked the problems to make them even more challenging. Many proposed problems were rejected because the models were too quick to identify the correct answer. Ultimately, they created 50 super-difficult challenges.

Now, AI labs that want to test their models can connect to the Epoch AI infrastructure and run the test under controlled conditions. Each model being tested is assigned certain limits—say, to solve a single problem, it can run for three hours and consume a million tokens (the "building blocks" of text from which the AI builds its understanding and answers).

So far, the best models have solved only a few of these tasks. Dr. Naskręcki, however, predicts that in just two or three years, AI will "saturate" this benchmark—providing correct answers to most of the questions. "And then we'll be able to say we have a model that's a truly good mathematician," the researcher believes.

However, he points out a key limitation: AI is "brilliant at clever combinations" and combining existing knowledge, but it can't create new concepts. "No current model will figure out how to prove the Riemann hypothesis. So if models solve all the problems we've prepared, the last domain left for mathematicians will be coming up with new, crazy mathematical ideas," the scientist assessed.

In his opinion, the development of AI is "a hammer that hits us over the head" and forces a revolution in thinking about work and education.

"We must abandon the Prussian school model, which produced obedient soldiers who would obey every order. Now we need people who can think independently, take risks, and build something new," he emphasized.

In his opinion, what is becoming crucial is so-called "fluid intelligence," the ability to solve problems creatively, along with thinking "slow" rather than "fast." Machines still don't possess this ability.

Dr. Naskręcki believes that a career in science still has merit, but its nature is changing. "There will be no more cutting corners and adding details to existing theories. Mathematics will return to its roots: it will involve asking bold questions and proposing unconventional solutions," said the Adam Mickiewicz University researcher.

He added that our advantage over AI remains our unique experiences: taking a walk, reading a book, seeing a play. Ideas that AI has no access to emerge from connections made in non-obvious areas. Therefore, according to the researcher, in the new reality our greatest value will lie not in the correct execution of routine tasks, but in the ability to ask questions and generate original ideas.

Ludwik Tomal (PAP)

lt/ zan/ mow/

bankier.pl
